Written
Corpus,
Language Type:
Monolingual
Languages:
English
Availability:
From Owner
License:
LDC
Size:
26.348 GByte
Production Status:
Existing-used
Use:
Discourse
- Paper title: Multi-Relational Script Learning for Discourse Relations
- Paper track: Long/Textual Inference and Other Areas of Semantics
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | I-Ta Lee | English Gigaword Fifth Edition | /N |
Documentation:
None
Written
Corpus,
Language Type:
Monolingual
Languages:
English
Availability:
Freely Available
License:
MIT
Size:
110000
Production Status:
Existing-used
Use:
Textual Entailment and Paraphrasing
- Paper title: HellaSwag: Can a Machine Really Finish Your Sentence?
- Paper track: Long/Textual Inference and Other Areas of Semantics
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Rowan Zellers | SWAG | /N |
Documentation:
None
Written
Corpus,
Language Type:
Monolingual
Languages:
English
Availability:
Freely Available
License:
MIT
Size:
70000 entries
Production Status:
Newly created-finished
Use:
Textual Entailment and Paraphrasing
- Paper title: HellaSwag: Can a Machine Really Finish Your Sentence?
- Paper track: Long/Textual Inference and Other Areas of Semantics
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Rowan Zellers | HellaSwag | /N |
Documentation:
None
Written
Corpus,
Language Type:
Bilingual
Languages:
English German
Availability:
Freely Available
License:
ODC-By
Size:
2002 entries
Production Status:
Newly created-finished
Use:
Emotion Recognition/Generation
- Paper title: Crowdsourcing and Validating Event-focused Emotion Corpora for German and English
- Paper track: Short/Resources and Evaluation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Roman Klinger | deISEAR and enISEAR | /N |
Documentation:
see paper
Written
Corpus,
Language Type:
Monolingual
Languages:
English
Availability:
Freely Available
License:
Size:
26682 sentences
Production Status:
Newly created-finished
Use:
Gender Information
- Paper title: Women's Syntactic Resilience and Men's Grammatical Luck: Gender-Bias in Part-of-Speech Tagging and Dependency Parsing
- Paper track: Short/Tagging, Chunking, Syntax and Parsing
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Aparna Garimella | Wall Street Journal (WSJ) | /N |
Documentation:
None
Treebank,
Language Type:
Monolingual
Languages:
English
Availability:
License:
Size:
None
Production Status:
Use:
- Paper title: Multilingual Constituency Parsing with Self-Attention and Pre-Training
- Paper track: Short/Tagging, Chunking, Syntax and Parsing
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Nikita Kitaev | Penn Treebank | /N |
Documentation:
None
Written
Evaluation Data,
Language Type:
Bilingual
Languages:
English German
Availability:
Freely Available
License:
Size:
<2000 entries
Production Status:
Existing-updated
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: Training Neural Machine Translation to Apply Terminology Constraints
- Paper track: Short/Machine Translation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Georgiana Dinu | Terminology-annotated En-De parallel data | /N |
Documentation:
https://github.com/mtresearcher/terminology_dataset/blob/master/README.md
Written
Corpus,
Language Type:
Monolingual
Languages:
English
Availability:
Freely Available
License:
Size:
2.7 GByte
Production Status:
Newly created-finished
Use:
Discourse
- Paper title: DisSent: Learning Sentence Representations from Explicit Discourse Relations
- Paper track: Long/Sentence-level semantics
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Allen Nie | DisSent Corpus | /N |
Documentation:
Documentation exists in English, publicly available on Github: https://github.com/windweller/DisExtract
Written
Corpus,
Language Type:
Monolingual
Languages:
English
Availability:
Freely Available
License:
Size:
25000 entries
Production Status:
Newly created-finished
Use:
Dialogue
- Paper title: Towards Empathetic Open-domain Conversation Models: A New Benchmark and Dataset
- Paper track: Long/Dialogue and Interactive Systems
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Hannah Rashkin | EmpatheticDialogues | /N |
Documentation:
https://parl.ai/
Written
Corpus,
Language Type:
Monolingual
Languages:
English
Availability:
Freely Available
License:
Size:
3855 extracted PDF files and 9584 extracted tables Other
Production Status:
Newly created-finished
Use:
Information Extraction, Information Retrieval
- Paper title: Identification of Tasks, Datasets, Evaluation Metrics, and Numeric Scores for Scientific Leaderboards Construction
- Paper track: Long/Information Extraction and Text Mining
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Yufang Hou | ARC-PDN | /N |
Documentation:
None




